Neuromorphic deep convolutional neural network learning systems for FPGA in real time
Deep learning algorithms have become one of the best approaches for pattern recognition in several fields, including computer vision, speech recognition, natural language processing, and audio recognition, among others. In computer vision, convolutional neural networks stand out due to their relatively simple supervised training and their efficiency in extracting features from a scene. Nowadays, there exist several convolutional neural network accelerator implementations that manage to execute these networks in real time. However, the number of operations and the power consumption of these implementations can be reduced by adopting a different processing paradigm, such as neuromorphic engineering.
The field of neuromorphic engineering studies the behavior of biological neural processing systems with the purpose of designing analog, digital, or mixed-signal systems that solve problems the way the human brain performs complex tasks, replicating the behavior and properties of biological neurons. Neuromorphic engineering tries to answer how our brain is able to learn and perform complex tasks with high efficiency under the paradigm of spike-based computation.
This thesis explores both frame-based and spike-based processing paradigms for the development of hardware architectures for visual pattern recognition based on convolutional neural networks. In this work, two FPGA implementations of frame-based convolutional neural network accelerator architectures, using OpenCL and SoC technologies, are presented. They are followed by a novel neuromorphic convolution processor for the spike-based processing paradigm, which implements the behaviour of the leaky integrate-and-fire neuron model. Furthermore, it reads the data row by row, being able to perform multiple layers on the same chip. Finally, a novel FPGA implementation of the Hierarchy of Time Surfaces algorithm and a new memory model for spike-based systems are proposed.
Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification
Deep learning is a cutting-edge approach that is being applied to many fields. For vision applications, Convolutional Neural Networks (CNN) achieve high accuracy in classification tasks. Numerous hardware accelerators have appeared in recent years to improve on CPU- or GPU-based solutions. This technology is commonly prototyped and tested on FPGAs before being considered for ASIC fabrication for mass production. The use of typical commercial cameras (30 fps) limits the capabilities of these systems for high-speed applications. The use of dynamic vision sensors (DVS), which emulate the behavior of a biological retina, is becoming increasingly important for these applications due to their nature: the information is represented by a continuous stream of spikes, and the frames to be processed by the CNN are constructed by collecting a fixed number of these spikes (called events). The faster an object moves, the more events the DVS produces, and thus the higher the equivalent frame rate. Therefore, using a DVS allows a frame to be computed at the maximum speed a CNN accelerator can offer. In this paper we present a VHDL/HLS description of a pipelined design for FPGA able to collect events from an Address-Event-Representation (AER) DVS retina and obtain a normalized histogram to be used by a particular CNN accelerator, called NullHop. VHDL is used to describe the circuit, and HLS for the computation blocks, which perform the normalization of a frame needed by the CNN. Results outperform previous implementations of frame collection and normalization using ARM processors running at 800 MHz on a Zynq7100, in both latency and power consumption. A measured 67% speedup factor is presented for a Roshambo CNN real-time experiment running at a 160 fps peak rate.
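
To make the event-collection scheme concrete, the following C++ sketch accumulates a fixed number of DVS events into a normalized histogram, in the spirit of the pre-processing described above. The event layout, frame resolution, event budget, and peak normalization are illustrative assumptions rather than the exact NullHop pipeline.

#include <array>
#include <cstdint>
#include <vector>

struct DvsEvent {
    uint16_t x;  // pixel column decoded from the AER address
    uint16_t y;  // pixel row decoded from the AER address
};

constexpr int kWidth = 64;             // assumed frame resolution
constexpr int kHeight = 64;
constexpr int kEventsPerFrame = 2048;  // assumed fixed event budget per frame

// Accumulate a fixed number of events into a histogram and normalize it.
std::array<float, kWidth * kHeight> collectFrame(const std::vector<DvsEvent>& events) {
    std::array<float, kWidth * kHeight> histogram{};   // zero-initialized bins
    int count = 0;
    float peak = 0.0f;
    for (const auto& e : events) {
        if (count == kEventsPerFrame) break;           // frame is complete
        if (e.x >= kWidth || e.y >= kHeight) continue; // ignore bad addresses
        ++count;
        float& bin = histogram[e.y * kWidth + e.x];
        bin += 1.0f;
        if (bin > peak) peak = bin;
    }
    if (peak > 0.0f)                                   // scale busiest pixel to 1.0
        for (float& v : histogram) v /= peak;
    return histogram;
}

Because the event budget is fixed, a faster-moving scene fills the histogram sooner, which is exactly why the equivalent frame rate grows with scene activity.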
Efficient DMA transfers management on embedded Linux PSoC for Deep-Learning gestures recognition: Using Dynamic Vision Sensor and NullHop one-layer CNN accelerator to play RoShamBo
This demonstration shows a Dynamic Vision Sensor able to capture visual motion at a speed equivalent to a high-speed camera (20k fps). The collected visual information is presented as a normalized histogram to a CNN accelerator hardware, called NullHop, that is able to process a pre-trained CNN to play Roshambo against a human. The CNN designed for this purpose consists of 5 convolutional layers and a fully connected layer. The latency for processing one histogram is 8 ms. NullHop is deployed on the FPGA fabric of a PSoC from Xilinx, the Zynq 7100, which is based on a dual-core ARM computer and a Kintex-7 with 444K logic cells, integrated in the same chip. The ARM computer runs Linux, and a specific C++ controller runs the whole demo. This controller runs in user space in order to extract the maximum throughput, thanks to an efficient use of the AXI-Stream interface based on DMA transfers. The short delay needed to process one visual histogram allows us to average several consecutive classification outputs, thereby providing the best estimation of the symbol that the user presents to the visual sensor. This output is then mapped to present the winning symbol within the 60 ms latency that the brain considers acceptable before suspecting a trick.
Ministerio de Economía y Competitividad TEC2016-77785-
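
Since one histogram is classified in about 8 ms, several consecutive CNN outputs fit inside the 60 ms budget and can be combined. A minimal C++ sketch of such an averaging step, here a majority vote over a sliding window, is shown below; the window length and the three-class labeling are assumptions, not the demo's exact controller logic.

#include <algorithm>
#include <array>
#include <cstddef>
#include <deque>

constexpr int kNumClasses = 3;      // rock, paper, scissors
constexpr std::size_t kWindow = 6;  // ~6 results x 8 ms fits the 60 ms budget

class VotingFilter {
public:
    // Feed the latest CNN class index; returns the current majority symbol.
    int push(int predictedClass) {
        history_.push_back(predictedClass);
        if (history_.size() > kWindow) history_.pop_front();
        std::array<int, kNumClasses> votes{};
        for (int c : history_) ++votes[c];
        return static_cast<int>(
            std::max_element(votes.begin(), votes.end()) - votes.begin());
    }

private:
    std::deque<int> history_;  // most recent classification outputs
};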
Spiking row-by-row FPGA Multi-kernel and Multi-layer Convolution Processor.
Spiking convolutional neural networks have become a novel approach for machine vision tasks, due to the low latency with which they process an input stimulus from a scene and the low power consumption of this kind of solution. Event-based systems only perform sum operations, instead of the sums of products required by frame-based systems. In this work, an upgrade of a neuromorphic event-based convolution accelerator for SCNNs, which is able to perform multiple layers with different kernel sizes, is presented. The system has a latency per layer from 1.44 μs to 9.98 μs for kernel sizes from 1x1 to 7x7.
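
The sum-only claim can be illustrated with a short C++ sketch of an event-driven convolution step: each incoming spike adds the kernel weights onto the membrane potentials of its neighbouring neurons, with no multiplications on the event path. Dimensions, threshold, and reset policy are illustrative assumptions, and the leak term of the LIF model is omitted for brevity.

#include <vector>

struct Spike { int x, y; };

// Event-driven convolution step: one input spike updates the membrane
// potentials under the kernel footprint using additions only.
void onSpike(const Spike& s,
             const std::vector<std::vector<float>>& kernel,  // e.g. 7x7 weights
             std::vector<std::vector<float>>& membrane,      // neuron potentials
             std::vector<Spike>& outSpikes,
             float threshold = 1.0f) {
    const int k = static_cast<int>(kernel.size());
    const int half = k / 2;
    for (int dy = -half; dy <= half; ++dy) {
        for (int dx = -half; dx <= half; ++dx) {
            const int x = s.x + dx;
            const int y = s.y + dy;
            if (y < 0 || y >= static_cast<int>(membrane.size())) continue;
            if (x < 0 || x >= static_cast<int>(membrane[y].size())) continue;
            membrane[y][x] += kernel[dy + half][dx + half];  // sum, no product
            if (membrane[y][x] >= threshold) {               // integrate and fire
                outSpikes.push_back({x, y});
                membrane[y][x] = 0.0f;                       // reset after firing
            }
        }
    }
}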
System based on inertial sensors for behavioral monitoring of wildlife
A sensor network is an integration of multiple sensors in a system to collect information about different environment variables. Monitoring systems allow us to determine the current state of a subject, to know its behavior, and sometimes to predict what is going to happen. This work presents a monitoring system for semi-wild animals that captures their actions using an IMU (inertial measurement unit) and a sensor fusion algorithm. Based on an ARM Cortex-M4 microcontroller, this system sends the data of the different sensor axes using ZigBee technology in two different operating modes: RAW (logging all information onto an SD card) or RT (real-time operation). The sensor fusion algorithm improves precision and reduces noise interference.
Junta de Andalucía P12-TIC-130
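
The abstract does not name the fusion algorithm, so the following C++ sketch uses a complementary filter, a common lightweight choice on Cortex-M4-class devices, purely for illustration; the axis convention and the blending coefficient are assumptions.

#include <cmath>

// Fuse gyroscope rate and accelerometer tilt into a single pitch estimate.
// gyroRate: deg/s around the pitch axis; ax, az: accelerometer axes in g;
// dt: sample period in seconds.
float fusePitch(float previousPitch, float gyroRate, float ax, float az, float dt) {
    const float alpha = 0.98f;  // trust the gyro short-term, the accel long-term
    const float accelPitch = std::atan2(ax, az) * 180.0f / 3.14159265f;
    const float gyroPitch = previousPitch + gyroRate * dt;  // integrate the rate
    // High-pass the gyro path, low-pass the noisy accelerometer path.
    return alpha * gyroPitch + (1.0f - alpha) * accelPitch;
}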
A Sensor Fusion Horse Gait Classification by a Spiking Neural Network on SpiNNaker
The study and monitoring of the behavior of wildlife has always been a subject of great interest. Although many systems can track animal positions using GPS, behavior classification is not a common task. For this work, a multi-sensory wearable device has been designed and implemented to be used in the Doñana National Park in order to control and monitor wild and semi-wild animals. The data obtained with these sensors is processed using a Spiking Neural Network (SNN) with Address-Event-Representation (AER) coding, and it is classified into a set of fixed activity behaviors. This work presents the full infrastructure deployed in Doñana to collect the data, the wearable device, the SNN implementation on SpiNNaker, and the classification results.
Ministerio de Economía y Competitividad TEC2012-37868-C04-02; Junta de Andalucía P12-TIC-130
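
As a rough illustration of the rate-coding step implied by the AER interface, the following C++ sketch maps a normalized sensor magnitude to a spike count per time window; the linear scaling and the window budget are assumptions, not the paper's actual encoder.

#include <cmath>
#include <cstdint>

// Map one sensor sample, normalized to [0, 1], to a spike count per window.
uint32_t rateEncode(float normalizedMagnitude, uint32_t maxSpikesPerWindow = 100) {
    if (normalizedMagnitude < 0.0f) normalizedMagnitude = 0.0f;  // clamp input
    if (normalizedMagnitude > 1.0f) normalizedMagnitude = 1.0f;
    // Linear rate code: firing rate proportional to signal magnitude.
    return static_cast<uint32_t>(std::lround(normalizedMagnitude * maxSpikesPerWindow));
}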
Interfacing PDM sensors with PFM spiking systems: application for Neuromorphic Auditory Sensors
In this paper we present a sub-system to convert audio information from low-power MEMS microphones with pulse density modulation (PDM) output into rate-coded spike streams. These spikes represent the input signal of a Neuromorphic Auditory Sensor (NAS), which is implemented with Spike Signal Processing (SSP) building blocks. For this conversion, we have designed an HDL component for FPGA that is able to interface with PDM microphones and convert their pulses to temporally distributed spikes, following a pulse frequency modulation (PFM) scheme with an accurate, configurable inter-spike interval. The new FPGA component has been tested in two scenarios: first as a stand-alone circuit for its characterization, and then integrated with a full NAS design to verify its behavior. This PDM interface demands less than 1% of the resources of a Spartan-6 FPGA and has a power consumption below 5 mW.
Ministerio de Economía y Competitividad TEC2016-77785-
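
A behavioural C++ model of the described conversion may help: the 1-bit PDM density is integrated and a spike is emitted whenever an accumulator crosses a threshold, with a configurable minimum inter-spike interval. The threshold and hold-off values are assumptions; the real component is an HDL circuit.

class PdmToPfm {
public:
    PdmToPfm(int threshold, int minIsiClocks)
        : threshold_(threshold), minIsi_(minIsiClocks), sinceLastSpike_(minIsiClocks) {}

    // Process one PDM bit per clock; returns true when a spike is emitted.
    bool clock(bool pdmBit) {
        accumulator_ += pdmBit ? 1 : 0;   // integrate the 1-bit density
        if (sinceLastSpike_ < minIsi_) {  // enforce the configurable ISI
            ++sinceLastSpike_;
            return false;
        }
        if (accumulator_ >= threshold_) {
            accumulator_ -= threshold_;   // keep the residue, sigma-delta style
            sinceLastSpike_ = 0;
            return true;                  // spike rate follows signal amplitude
        }
        ++sinceLastSpike_;
        return false;
    }

private:
    int threshold_;
    int minIsi_;
    int accumulator_ = 0;
    int sinceLastSpike_;
};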
Live Demonstration: Neuromorphic Row-by-Row Multi-convolution FPGA Processor-SpiNNaker architecture for Dynamic-Vision Feature Extraction
In this demonstration, a spiking neural network architecture for vision recognition is presented, using an FPGA spiking convolution processor based on leaky integrate-and-fire (LIF) neurons together with a SpiNNaker board. The network has been trained with the Poker-DVS dataset in order to classify the four different card symbols. The spiking convolution processor extracts features from images in the form of spikes, computed by one layer of 64 convolutions. These features are sent to an OKAERtool board that converts from AER to the 2-of-7 protocol, to be classified by a spiking neural network deployed on a SpiNNaker platform.
Event-based Row-by-Row Multi-convolution engine for Dynamic-Vision Feature Extraction on FPGA
Neural network algorithms are commonly used to recognize patterns from different data sources, such as audio or vision. In image recognition, Convolutional Neural Networks are one of the most effective techniques due to the high accuracy they achieve. This kind of algorithm requires billions of addition and multiplication operations over all the pixels of an image. However, it is possible to reduce the number of operations using computer vision techniques other than frame-based ones, e.g. neuromorphic frame-free techniques. There exist many neuromorphic vision sensors that detect pixels that have changed their luminosity. In this study, an event-based convolution engine for FPGA is presented. This engine models an array of leaky integrate-and-fire neurons. It is able to apply different kernel sizes, from 1x1 to 7x7, which are computed row by row, with a maximum of 64 different convolution kernels. The design presented is able to process 64 feature maps of 7x7 with a latency of 8.98 μs.
Ministerio de Economía y Competitividad TEC2016-77785-
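
A minimal C++ sketch of the leaky integrate-and-fire neuron that the engine models is given below: between events the potential decays toward rest, and crossing the threshold fires a spike followed by a reset. The leak constant, threshold, and timestamp units are illustrative assumptions.

#include <cmath>
#include <cstdint>

struct LifNeuron {
    float potential = 0.0f;
    uint64_t lastUpdateUs = 0;

    // Apply one weighted input event arriving at time nowUs; returns true
    // if the neuron fires.
    bool integrate(float weight, uint64_t nowUs,
                   float tauUs = 1000.0f, float threshold = 1.0f) {
        // Exponential leak over the time elapsed since the last event.
        const float dt = static_cast<float>(nowUs - lastUpdateUs);
        potential *= std::exp(-dt / tauUs);
        lastUpdateUs = nowUs;
        potential += weight;           // integrate the event's contribution
        if (potential >= threshold) {  // fire and reset
            potential = 0.0f;
            return true;
        }
        return false;
    }
};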
Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity in both industry and academia. Special interest surrounds Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged by memory bottlenecks, since the many convolutional and fully-connected layers demand a large amount of communication for parallel computation. Multi-core CPU-based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance, but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for their execution on GPGPUs or FPGAs. FPGA design solutions are also actively being explored; they allow implementing the memory hierarchy using embedded parallel BlockRAMs. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replication and inconsistencies, and makes FPGAs potentially powerful solutions for real-time classification with CNNs. In this paper, the OpenCL co-design frameworks adopted by both Altera and Xilinx for pseudo-automatic development are evaluated. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance, and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization, and more compact boards. Altera provides multi-platform tools, a mature design community, and better execution times.
Ministerio de Economía y Competitividad TEC2016-77785-
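
For readers unfamiliar with the co-design flow, the following plain C++ loop nest shows the computation that such OpenCL frameworks parallelize: each (output channel, row, column) iteration of the outer loops typically becomes one work-item executing the inner accumulation. Tensor shapes and the CHW layout with valid padding are assumptions.

#include <vector>

// One convolutional layer as a reference loop nest.
void convLayer(const std::vector<float>& input,   // inC x inH x inW
               const std::vector<float>& weights, // outC x inC x k x k
               std::vector<float>& output,        // outC x outH x outW
               int inC, int inH, int inW, int outC, int k) {
    const int outH = inH - k + 1;
    const int outW = inW - k + 1;
    output.assign(static_cast<size_t>(outC) * outH * outW, 0.0f);
    for (int oc = 0; oc < outC; ++oc)
        for (int oy = 0; oy < outH; ++oy)
            for (int ox = 0; ox < outW; ++ox) {   // maps to one OpenCL work-item
                float acc = 0.0f;
                for (int ic = 0; ic < inC; ++ic)
                    for (int ky = 0; ky < k; ++ky)
                        for (int kx = 0; kx < k; ++kx)
                            acc += input[(ic * inH + oy + ky) * inW + ox + kx] *
                                   weights[((oc * inC + ic) * k + ky) * k + kx];
                output[(oc * outH + oy) * outW + ox] = acc;
            }
}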